Introduction
Cardiotocograms, also known as CTGs, have been instrumental within clinical medicine for a long time. Obstetricians use these measurements and classifications to obtain detailed information and intelligence about newborns and their mother prior and during labor. In 2018, an article presented through the Journal of Clinical Medicine detailed the practicality of CTG. The same article noted that interpretations of these censorial readings is mainly attributed to the observer; which creates challenges of consistency of interpretations and defies the human naked- eye. Questions like what happens if/when the interpreter misses a key detail, or what could be the meaning of a combination of diagnostic signals, furthermore, what time-sensitive conditions may these measurements expose, requiring immediate actions? These are few examples of concerns posed by the continuous practice of merely optical assessments of a CTG. (Zhao, Zhang, and Deng 2018)
The following exploration presents an assessment of CTGs using the conditional inference tree (ctree) model. The same shows how the algorithm expedites and enhances the interpretation of CTG readings while appraising multiple fetal readings simultaneously. Moreover, the study aims to identify potential hidden patters which may require further attention.
Data
The analyzed data comes for the UCI Machine Learning Repository(D. Campos 2000), and it consists of measurements of fetal heart rate (FHR) and other important characteristics as identified and recorded within each cardiotocograms. Ultimately, all CTGs were classified by three subject matter experts, and under unanimity, assigned with response-labels based on the fetal state and/or morphological detected patterns. The following is a list of the variables meaning according to the UCI repository:
- LB - FHR baseline (beats per minute)
- AC - # of accelerations per second
- FM - # of fetal movements per second
- UC - # of uterine contractions per second
- DL - # of light decelerations per second
- DS - # of severe decelerations per second
- DP - # of prolonged decelerations per second
- ASTV - percentage of time with abnormal short term variability
- MSTV - mean value of short term variability
- ALTV - percentage of time with abnormal long term variability
- MLTV - mean value of long term variability Width - width of FHR histogram
- Min - minimum of FHR histogram
- Max - Maximum of FHR histogram
- Nmax - # of histogram peaks
- Nzeros - # of histogram zeros
- Mode - histogram mode
- Mean - histogram mean
- Median - histogram median
- Variance - histogram variance
- Tendency - histogram tendency
- CLASS - FHR pattern class code (1 to 10)
- NSP - fetal state class code (N=normal; S=suspect; P=pathologic)
Exploratory Data Analysis
During exploratory data analysis the data is confirmed as a combination of 2126 observations and 23 variables. The following is a preview of the first six observations after been ingested as as_tibble.
df<-as_tibble(read.csv(file="cardiotocography.csv", head=TRUE, sep=",", as.is=FALSE))
print(df, n=6)# A tibble: 2,126 × 23
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl> <int> <int>
1 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62
2 132 0.006 0 0.006 0.003 0 0 17 2.1 0 10.4 130 68
3 133 0.003 0 0.008 0.003 0 0 16 2.1 0 13.4 130 68
4 134 0.003 0 0.008 0.003 0 0 16 2.4 0 23 117 53
5 132 0.007 0 0.008 0 0 0 16 2.4 0 19.9 117 53
6 134 0.001 0 0.01 0.009 0 0.002 26 5.9 0 0 150 50
# ℹ 2,120 more rows
# ℹ 10 more variables: Max <int>, Nmax <int>, Nzeros <int>, Mode <int>,
# Mean <int>, Median <int>, Variance <int>, Tendency <int>, CLASS <int>,
# NSP <int>
The following code chunks portray a basic assessment of specific attributes and areas of importance such as variability of observations, presence of missing values, mean, standard deviation,
Min. 1st Qu. Median Mean 3rd Qu. Max.
106.0 126.0 133.0 133.3 140.0 160.0
Note: LB attribute’s IQR equals 14, which is significantly small indicating a most values to be clustered around the middle. The following histogram confirms the small IQR.
hist(df$LB, breaks = 12, main="Histogram of FHR Baseline", xlab="(beats per minute)",
border="darkblue",col ="lightgrey", labels = F) LB AC FM UC DL DS DP ASTV
0 0 0 0 0 0 0 0
MSTV ALTV MLTV Width Min Max Nmax Nzeros
0 0 0 0 0 0 0 0
Mode Mean Median Variance Tendency CLASS NSP
0 0 0 0 0 0 0
t.test(df$LB)
One Sample t-test
data: df$LB
t = 624.59, df = 2125, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
132.8853 133.7224
sample estimates:
mean of x
133.3039
# LB stats
m<-mean(df$LB)
std<-sd(df$LB)
upr=m+std
lwr=m-std
lbdf <- data.frame(df,my_x = 0 + rnorm(length(df$LB),
mean=m, sd=std),my_y = 0 + rnorm(length(df$LB), mean=m, sd=std))
# LB Variation
print(pltlb <- ggplot(lbdf, xlab = F, aes(x=(my_x), y=my_y)) +
geom_line(col="grey51",linemitre=1) +
geom_smooth(method=lm , color="blue", lty=3, fill="light blue", se=T) +
labs(x=NULL, y="BPM", title="FHR LB Variation\nIn Relation To The Mean")+
theme_ipsum())# very first graph representation with manual boundary calculations
upr2=m+(std*2)
lwr2=m-(std*2)
# Plot LB distribution boundaries
plot.new()
plot(df$LB, type="l", col="grey51", ylab="LB", main="1 & 2 Standard Deviations")
abline(h = m, col = "blue")
abline(h = upr, col = "orange", lty=2)
abline(h = lwr, col = "orange", lty=2)
abline(h = upr2, col = "red", lty=2)
abline(h = lwr2, col = "red", lty=2)
text(-65,134, "mean:133.30", col = "blue", adj = c(0, -.1))
text(-65,upr, round(upr, 2), col = "black", adj = c(0, -.1))
text(-65,lwr, round(lwr, 2), col = "black", adj = c(0, -.1))
text(-65,upr2, round(upr2, 2), col = "black", adj = c(0, -.1))
text(-65,lwr2, round(lwr2, 2), col = "black", adj = c(0, -.1))# LB Observations higher than 2-s.d.
lba<-(sum(df$LB >152.99)) #39
# LB Observations lower than 2-s.d.
lbb<-(sum(df$LB <113.62)) #44
lba+lbb #=83 obs outside of 2-s.d.[1] 83
[1] 0.9609595
# Exclude non-original measurements, rename targeted values
df[12:22] <- NULL
df$NSP<-as.numeric(df$NSP)
# enumeration of labels with the factor function
df$NSP<-factor(df$NSP, levels= 1:3, labels = c("Normal","Suspect", "Pathologic"))# Visualization of original NSP
plot(df$NSP, main="Original NSP Distribution",
xlab="Fetal State Classification",
ylab="Frequency", col=c(3, 7, 2))
text(df$NSP, labels=as.character(tabulate(df$NSP)), adj=3, pos=3)# additional way to preview distribution of attributes
# distributions preview
df[,1:12] %>%
gather() %>%
ggplot(aes(value)) +
theme_light() + labs( title="FHR Measurement Distributions")+
theme(axis.text.x = element_text(angle=90)) +
facet_wrap(~ key, scales = "free", shrink = TRUE) +
geom_bar(mapping = aes(value),
color="darkblue", fill="lightgrey")In progress …
# Summary of DF after encoding the label vector as numbers.
summary(df) LB AC FM UC
Min. :106.0 Min. :0.000000 Min. :0.000000 Min. :0.000000
1st Qu.:126.0 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.002000
Median :133.0 Median :0.002000 Median :0.000000 Median :0.004000
Mean :133.3 Mean :0.003178 Mean :0.009481 Mean :0.004366
3rd Qu.:140.0 3rd Qu.:0.006000 3rd Qu.:0.003000 3rd Qu.:0.007000
Max. :160.0 Max. :0.019000 Max. :0.481000 Max. :0.015000
DL DS DP ASTV
Min. :0.000000 Min. :0.000e+00 Min. :0.0000000 Min. :12.00
1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:32.00
Median :0.000000 Median :0.000e+00 Median :0.0000000 Median :49.00
Mean :0.001889 Mean :3.293e-06 Mean :0.0001585 Mean :46.99
3rd Qu.:0.003000 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:61.00
Max. :0.015000 Max. :1.000e-03 Max. :0.0050000 Max. :87.00
MSTV ALTV MLTV NSP
Min. :0.200 Min. : 0.000 Min. : 0.000 Normal :1655
1st Qu.:0.700 1st Qu.: 0.000 1st Qu.: 4.600 Suspect : 295
Median :1.200 Median : 0.000 Median : 7.400 Pathologic: 176
Mean :1.333 Mean : 9.847 Mean : 8.188
3rd Qu.:1.700 3rd Qu.:11.000 3rd Qu.:10.800
Max. :7.000 Max. :91.000 Max. :50.700
# output the tree structure
# print(model)
model[4]
Model formula:
NSP ~ LB + AC + FM + UC + DL + DS + DP + ASTV + MSTV + ALTV +
MLTV
Fitted party:
[4] root
| [5] ASTV <= 73
| | [6] DL <= 0.008
| | | [7] DP <= 0
| | | | [8] LB <= 149
| | | | | [9] AC <= 0.001
| | | | | | [10] UC <= 0: Normal (n = 34, err = 14.7%)
| | | | | | [11] UC > 0: Normal (n = 231, err = 3.5%)
| | | | | [12] AC > 0.001: Normal (n = 626, err = 0.3%)
| | | | [13] LB > 149: Normal (n = 17, err = 35.3%)
| | | [14] DP > 0
| | | | [15] MLTV <= 0.9: Normal (n = 7, err = 28.6%)
| | | | [16] MLTV > 0.9
| | | | | [17] MLTV <= 8.8: Normal (n = 28, err = 0.0%)
| | | | | [18] MLTV > 8.8: Normal (n = 7, err = 28.6%)
| | [19] DL > 0.008
| | | [20] ASTV <= 58: Normal (n = 35, err = 0.0%)
| | | [21] ASTV > 58: Pathologic (n = 25, err = 40.0%)
| [22] ASTV > 73: Pathologic (n = 11, err = 45.5%)
Number of inner nodes: 9
Number of terminal nodes: 10
#8. visualize the tree
# plot(model, main="Cardiotocography Data\n Conditional Inference Tree\n'Extended'",
# type="simple",ep_args = list(justmin = 8), drop_terminal = F,
# gp = gpar(fontsize = 9), margins = c(4,4, 4, 4))
plot(model, type="extended", ep_args = list(justmin =8), drop_terminal=F, tnex=1.5,
gp=gpar(fontsize = 8, col="dark blue"),
inner_panel = node_inner(model, fill=c("light grey","cyan"), pval=T),
terminal_panel=node_barplot(model, fill=c(3,7,2), beside=T, ymax=1, rot = 45,
just = c(.95,.5), ylines=F, widths = 1, gap=0.05, reverse=F, id=T),
margins = c(3,0, 3, 0),
main ="Cardiotocography Data\n Conditional Inference Tree\n'Extended'") ACTUAL
PREDICTED Normal Suspect Pathologic
Normal 1168 70 4
Suspect 10 123 1
Pathologic 17 8 122
# predicted classification accuracy with training data
sum(predict(model) == train.data$NSP)/length(train.data$NSP)[1] 0.9277741
prop.table(table(predict(model), train.data$NSP, dnn=c("PREDICTED", "ACTUAL"))) ACTUAL
PREDICTED Normal Suspect Pathologic
Normal 0.7669074196 0.0459619173 0.0026263953
Suspect 0.0065659882 0.0807616546 0.0006565988
Pathologic 0.0111621799 0.0052527905 0.0801050558
#10. Evaluate the model on a test data
model2 <- ctree(myFormula, data = test.data)
model2[4]
Model formula:
NSP ~ LB + AC + FM + UC + DL + DS + DP + ASTV + MSTV + ALTV +
MLTV
Fitted party:
[4] root
| [5] DP <= 0
| | [6] DL <= 0.01
| | | [7] ALTV <= 1: Normal (n = 291, err = 1.0%)
| | | [8] ALTV > 1
| | | | [9] LB <= 143
| | | | | [10] ASTV <= 59: Normal (n = 78, err = 0.0%)
| | | | | [11] ASTV > 59: Normal (n = 14, err = 42.9%)
| | | | [12] LB > 143
| | | | | [13] ASTV <= 45: Normal (n = 7, err = 14.3%)
| | | | | [14] ASTV > 45: Suspect (n = 13, err = 15.4%)
| | [15] DL > 0.01: Normal (n = 10, err = 30.0%)
| [16] DP > 0
| | [17] MSTV <= 1.7: Normal (n = 12, err = 25.0%)
| | [18] MSTV > 1.7: Normal (n = 12, err = 50.0%)
Number of inner nodes: 7
Number of terminal nodes: 8
# plot(model2, main="Cardiotocography Data\n Simple Conditional Inference Tree\nby ocardec",
# type="simple",ep_args = list(justmin = 10), drop_terminal = F, gp = gpar(fontsize = 12))
plot(model2, ep_args = list(justmin = 8), type="extended", drop_terminal = F,
tnex=1, gp= gpar(fontsize = 8, col="dark blue"),
inner_panel = node_inner (model2, fill=c("lightgrey","yellow"), pval=T, id=T),
terminal_panel=node_barplot(model2, col="black", fill=c(3,7,2, 0.3), beside=T,
ymax=1, rot = 45, just = c("right", "top"), ylines=F,
widths=1, gap=0.1, reverse=F, id=F), margins = c(3, 0, 3, 0),
main="Cardiotocography Data\n Extended Conditional Inference Tree\nby ocardec")# Confusion matrix and stats
testPred2 <- predict(model2, newdata = test.data, method="NSP")
confusionMatrix(testPred2, test.data$NSP)Confusion Matrix and Statistics
Reference
Prediction Normal Suspect Pathologic
Normal 449 21 12
Suspect 9 73 4
Pathologic 2 0 33
Overall Statistics
Accuracy : 0.9204
95% CI : (0.8958, 0.9407)
No Information Rate : 0.7629
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.7809
Mcnemar's Test P-Value : 0.001165
Statistics by Class:
Class: Normal Class: Suspect Class: Pathologic
Sensitivity 0.9761 0.7766 0.67347
Specificity 0.7692 0.9745 0.99639
Pos Pred Value 0.9315 0.8488 0.94286
Neg Pred Value 0.9091 0.9594 0.97183
Prevalence 0.7629 0.1559 0.08126
Detection Rate 0.7446 0.1211 0.05473
Detection Prevalence 0.7993 0.1426 0.05804
Balanced Accuracy 0.8727 0.8755 0.83493